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Pulsar timing analysis in the presence of correlated noise 
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ABSTRACT 

Pulsar timing observations are usually analysed with least-square-fitting procedures 
under the assumption that the timing residuals are uncorrelated (statistically "white" ) . 
Pulsar observers are well aware that this assumption often breaks down and causes 
severe errors in estimating the parameters of the timing model and their uncertain- 
ties. Ad hoc methods for minimizing these errors have been developed, but we show 
that they are far from optimal. Compensation for temporal correlation can be done 
optimally if the covariance matrix of the residuals is known using a linear transforma- 
tion that whitens both the residuals and the timing model. We adopt a transformation 
based on the Cholesky decomposition of the covariance matrix, but the transformation 
is not unique. We show how to estimate the covariance matrix with sufficient accuracy 
to optimize the pulsar timing analysis. We also show how to apply this procedure to 
estimate the spectrum of any time series with a steep red power-law spectrum, includ- 
ing those with irregular sampling and variable error bars, which are otherwise very 
difficult to analyse. 
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1 INTRODUCTION 



Pulsar timing provides a powerful tool for studying a wide 
, range of phenomena ranging from basic physics, such as test - 
' ing the general theory of relativity fe.g.. lKramer et al.ll2006r ). 
through astronomy, for in stance meas uring pulsar positions 
and proper motions (e.g.. lHobbs et al. 2005). to looking for 
' irregularities in te rrestrial time scales fe.g.. |Petit fc Tavellal 
119961 : lRodinll2008l). An overview of the pu lsar timing tech- 
nique is giv en inlLorimer fc Kramen (|2005l ) and details are 
provided in Edwards et al.l ( 20061 ) . In brief, a pulsar tim- 
ing model is used to predict pulse times-of-arrival (ToAs) 
at the observatory. These model predictions are compared 
with the measured ToAs and the differences are known as 
"pulsar timing residuals" . The timing model is subsequently 
improved using a least-squares-fitting procedure to minimize 
the timing residuals. 

Since most of the parameters of the timing model are 
linear, at least for small perturbations, the fitting procedure 
is straightforward and the routines return an estimate of 
each of the parameters and the corresponding covariance 
matrix. It has been the practice in pulsar timing to assume 
that the timing residuals are uncorrelated, although it is 
widely understood that this assumption is usually invalid. 
Correlated timing residuals occur for many reasons, among 
them: inadequate calibration of the raw observations (e.g., 
Ivan Stratenll2006l ): failure to correct for variations in the 
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Figure 1. Timing residuals for PSR J1939+2134. Panel (a) shows 
the residuals from a standard pulsar timing model fit for the spin- 
frequency and its derivative. Panel (b) shows the residuals after 
also fitting for the position of the pulsar and a phase jump be- 
tween Arecibo and Parkes observations using the simple weighted 
least squares method. Panel (c) shows the residuals after the same 
fitting as in panel (b) using the Cholesky method. The effect of 
changing i/ by icr is illustrated in panels (a) and (c) by the dashed 
lines. 
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interstellar dispersion (e.g.. IYou et al.ll2007^ : and "timing 
noise" intrinsic to the pulsar. Timing noise is still not fully 
understood, but usually refers to unex plained low- frequency 
features in the residuals (e.g.. iLvne et al. 2010 ). The main 
effects of neglecting correlation in the timing residuals are: 
(1) the parameters of the timing model are not estimated 
as accurately as possible and may have systematic biases; 
and (2) the uncertainties on the best-fit parameters are not 
correct. 

An extreme example of the problem can be seen 
in the 20-cm timing residuals for the millisecond pulsar 
J1939-I-2134 assembled by Verbiest et al. (2009). The first 
eight years o f these observation s were taken at the Arecibo 
Observatory (IKaspi et al.lll994 ) and the remaining observa- 
tions were taken at the Parkes Observatory. There is an un- 
known phase discontinuity between the observations at the 
two observatories. An initial estimate of this "jump" was 
made by fitting the observations for only a year on either 
side of the jump with a model including the pulse frequency 
V, its first derivative v, and the jump. The fitting of u and 
u removes a quadratic from the residuals, which in this case 
makes them almost white and allows for a reasonable initial 
estimate of the jump. The residuals are displayed in Figure[T] 
panel (a) after a weighted least squares (WLS) fit for u and 
z> using the initial jump estimate. Fitting v and over the 
longer data span leaves an obvious cubic term in the residu- 
als. In panel (b) the residuals are shown after also fitting for 
the position of the pulsar and the jump using WLS over the 
entire data span. The jump fitting over the entire data span 
is obviously catastrophic. An offset in position introduces 
an annual sine wave into the post-fit residuals. Ideally the 
fitting process would minimize the power in the spectrum 
at 1 y~^, but the high degree of correlation in the residuals 
introduces such a large error in the estimated position that 
it actually adds power at 1 y~^. This shows as a distinct an- 
nual ripple in panel (b). Panel (c) shows the result of fitting 
the same parameters over the same data span as panel (b) 
using the new method that we describe in this paper. The 
new method, which we refer to as the Cholesky method, ob- 
viously fits the position and jump much better than panel 
(b) but there is an obvious trend in panel (c). This is because 
the estimate of v in panel (c) is quite different from the one 
in panel (a). In panel (a) some of the red noise is absorbed 
into the v estimate. The error in the trend is much larger 
than one would expect, due to the red noise. The effect of 
changing v by ±a is illustrated in panels (a) and (c) by the 
dashed lines. One can see that the trend in panel (c) is well 
within the uncertainty of the v estimate. The estimation of 
u and u will be discussed in more detail in section 5.3. 

Another example of this problem is a comparison of 
the parallax of PSR J0437— 4715 as estimated from a 
timing analysis (jVerbiest et al.l 120081 ). with that estimated 
from recent Very Long Baselin e Interferometric (VLBI) 
observations (JDeller et all 120081 ). The VLBI parallax is 
6.396±0.054 mas. The timing parallax estimated using a 
WLS fit was 6.65±0.07mas and we were able to dupli- 
cate this using the same observations. We obtain a value 
of 6.34±0.12mas using the Cholesky method. Clearly the 
Cholesky result is consistent with the VLBI result and the 
WLS method is not. It should be noted that Verbiest et 
al. (2008) were doubtful of the formal error estimate for the 
WLS method and they re-estimated the error using a Monte 



Carlo simulation which increased the error from 0.07 mas to 
0.51 mas. As the actual difference is 0.25 mas this revised 
error estimate may have been conservative. It is quite com- 
mon for pulsar observers to be suspicious of the error esti- 
mates obtained using WLS fits and to modify them in var- 
ious ad hoc ways. We will show that using the Cholesky 
method eliminates this problem for all parameters of the 
timing model with time scales shorter than the observing 
span. V and v always have time scales comparable with the 
observing span and will be discussed separately in section 
5.3. 

Pulsar observers have often attempted to improve pa- 
rameter estimates by removing some portion of the low- 
frequency timing noise, taking care not to remove the com- 
ponents that are needed to estimate the parameters of inter- 
est. The low frequencies have been removed by adding them 
to the timing model, either as a high order polynomial (e.g., 
iThorsett et al.lll999l ) or as a carefully chosen Fourier series 
(e.g., FITWAVES in Tempo2; Hobbs et al. 2004). In either 
case the residuals are "flattened" or "whitened" and an ade- 
quate fit for position can be obtained (throughout this paper 
timing residuals that are uncorrelated are termed "white" 
and residuals that exhibit a steep low-frequency spectrum 
are termed "red"). Neither method produces a good fit to 
a phase jump because the effect of a phase jump is not lo- 
calised in frequency in the residual spectrum. 

In this paper we describe a method of optimizing the 
least-squares fit by finding a linear transformation which 
whitens and normalizes the residuals. This transformation is 
then applied to both the observations and the timing model. 
The parameters can then be found by fitting the transformed 
timing model to the transformed observations using ordinary 
least- squares. The transformation can be found exactly if the 
covariance matrix of the residuals is known, although it is 
not unique. This process is equivalent to the so-called "gen- 
eralized least-squares" (GLS) solution, indeed it is how that 
solution was discovered. This method is not new, but it has 
not been applied to pulsar timing observations before. The 
GLS solution provides the best linear unbiased estimator of 
the parameters of the timing model and the best unbiased 
estimator of the covariance matrix of the estimated param- 
eters. In most cases the covariance matrix of the residuals 
is not known and must be estimated from the observations. 
We have developed a procedure for estimating the covariance 
matrix which works well for the pulsars we have tested and 
we find that the parameter estimates are not unduly sensi- 
tive to errors in estimating the covariance matrix. It should 
be noted that the residuals formed by subtracting the tim- 
ing model with the best fit parameters from the observations 
will not be white as can be seen in Figure 1 panel (c). The 
advantages of the Cholesky method over the polynomial or 
Fourier methods are: (1) it provides more accurate parame- 
ter estimates; (2) it provides a more accurate estimate of the 
covariance matrix of the parameters; (3) it does not require 
any "adjustment" to fit different parameters of the timing 
model. 

In section 2 we outline the theory of linear least-squares 
fitting when the covariance matrix of the residuals is known. 
When the covariance is not known one must usually attempt 
a spectrum analysis of the residuals. In section 3 we discuss 
the problem of spectrum analysis of steep red random pro- 
cesses which are sampled irregularly and each sample has 
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a different uncertainty. In section 4 we describe our proce- 
dure for estimating the covariance matrix of the residuals by 
spectrum analysis and demonstrate it on two pulsars with 
different types of timing noise. In section 5 we show that the 
Cholesky method is consistently superior to the WLS, poly- 
nomial, and Fourier methods, using simulations to make the 
comparisons more precise. Finally in section 6 we use sim- 
ulations to establish the sensitivity of the Cholesky method 
to errors in estimating the covariance matrix. The tools 
that we discuss ha ve been incorporated into the Tempo2 
l|Hobbs et al.l 120061 ) timing analysis package and are avail- 
able on-linqj for the community. A step-by-step tutorial on 
using the Cholesky method is also available on the Tempo2 
web site. 



2 LINEAR LEAST SQUARES THEORY 

The least-squares formalism separates the observations into 
a deterministic component, the timing model, and a (zero 
mean) random component. The random component is re- 
ferred to in the statistical literature as the "error" and in 
the pulsar community as "post-fit timing residuals". The 
least-squares is linear if the timing model is a linear function 
of the parameters which must be estimated. The formalism 
can be compactly described using matrix algebra as follows. 
Here n timing residuals are modelled using a system of linear 
equations in m < n parameters. 



R^MP+E 



(1) 



where R is an n-point column vector representing the pre-fit 
timing residuals, M is an n x m matrix describing the pulsar 
timing model, P is an m-point column vector containing 
the fitted parameters and E is an n-point column vector 
representing the post-fit residuals. If the post-fit residuals 
are independent with equal variance (a^) for all observations 
(homoscedastic), then one minimises the squared error. 
In vector notation the squared error is 



E^E ={R- MPf{R - MP) 



(2) 



and the solution, which is called ordinary least squares 
(OLS), is 



Past = {M^My^M^R. 



(3) 



If the residuals have a gaussian distribution this is also 
"maximum likelihood solution". The covariance matrix of 
the parameters is given by 



cov{Pa, 



(PcstPL) = a (M^M)- 



(4) 



Here the angle brackets denote a statistical expectation. 

The normalised squared error EF E/a^ is a chi-squared 
random variable with n — m degrees of freedom. It is often 
referred to as x^ and used as part of a "goodness-of-fit test." 
It is the normal practice to scale cov{Past) by X^ /{n — m) 



at least when x^ > {n - 



If x < {n — m) there may be 



a problem. Either the data may have been "overfit" or a 
may have been underestimated. If a^ is unknown a priori, 
then one estimates it using 



E^E/(n~m). 



http://www.atnf.csiro.au/research/pulsar/teinpo2 



(5) 



If the residuals are white, but each ToA has a different 
variance (heteroscedastic), then one minimises the weighted 
squared error (the WLS solution). This approach is widely 
used in pulsar timing analysis. For white residuals, the co- 
variance matrix of the residuals is V a diagonal matrix with 
the variance of the samples on the main diagonal. In this 
form the weighted squared error is E^W'^E and the solu- 
tion is 



Pest = {]sJVY'^M.y^yVW'^R. 



(6) 



The same result can be obtained by normalizing both the 
data and the model by multiplying each by V^"'^. That is 
Rn = V^^-^P and Mn = V'^'^M. Then the errors will aU 
have unit variance and the solution is 



Pest = (Mn^Mn)"'^" ^' 



Mn'P. 



•AT. 



(7) 



Thus the WLS solution becomes an OLS problem in the nor- 
malized variables. If the normalization is correct the mini- 
mum squared error En En will have a x distribution with 
n — m degrees of freedom. The covariance matrix of the pa- 
rameters is ix^/in — 77i))(Mn"^Mn)~^. It must be scaled 
by the measured x^ to allow for errors in the normalization. 
The OLS and WLS solutions have been known since 
the work of Gauss and Legendre at the beginning of the 
19th century. The WLS analysis can be extended to the 
case of correlated residuals if the covariance matrix of the 
residuals C = {EE'^) is known. In this case one minimises 
E'^C~^E and the solution (Aitken, 1935) is often referred 
to as generalised least squares (GLS). 



(M^C^^M) 



"^M'^C'^P. 



(8) 



The GLS solution was derived, and is best described, as 
a normalizing and whitening process. C is Hermitian pos- 
itive semi-definite and so can be factored into C — UU'^ 
using a Cholesky lower triangle factorisation. The matrix 
U^"^ can be used as a normalizing and whitening transfor- 
mation (but is not unique, the Mahalanobis transformation 
can also be used). Defining i?„ — \J~^E, Pw = U~^P and 
Mw ~ U~^M, we find that cov(Ew) ~ I, the identity ma- 
trix. The transformed residuals E^ are therefore both white 
and normalized to unit variance. The GLS solution is now an 
OLS problem in the transformed variables, and Equation (8) 
can be rewritten as 



Pest = (M^Mw)-'M^P. 



(9) 



^ T ^ 

-Ew E^ 



Again if the normalization is correct x — ^w E^ is 
a x^ random variable with n — m degrees of freedom 
and the covariance matrix of the parameters is {x^ /{n — 
m))(M„^M„)-\ 

In summary, all linear least-squares problems can be 
reduced to ordinary least-squares by a suitable transforma- 
tion. However, to find this transformation we must know the 
covariance matrix of the residuals at least within a constant 
multiplier. We will defer until section 4 the discussion of how 
to estimate the covariance matrix from the observations. The 
least squares solution, when properly transformed, provides 
minimum variance, linear unbiased estimators of the model 
parameters and their uncertainties. It should be noted that 
inverting matrices of the form (M"^M) directly is not com- 
putationally efficient and other algorithms should be em- 
ployed. Tempo2 uses a singular value decomposition. 
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Equations (8) and (9) are the same, so it does not mat- 
ter which form one uses. However one must, in most cases, 
first estimate the covariance matrix and confirm that it is 
correct. This requires the Cholesky (or equivalent) transfor- 
mation. So the transformation must be found and appUed in 
any case. In practice we find it efficient to use an OLS solver 
for all cases so we first form Mw and -Rw then use equation 
(9). 



3 LEAST SQUARES SPECTRAL ANALYSIS 

Spectral analysis is intimately connected to the analysis of 
pulsar timing observations in two ways. First, the timing 
model includes a number of parameters which control the 
amplitude and phase of sine waves, so fitting for those pa- 
rameters amounts to a spectral analysis. Second, optimal 
least squares fitting requires estimation of the covariance 
matrix and this will require some form of spectral analy- 
sis. Here we provide a very brief discussion of the aspects 
of spectral analysis important to the analysis of pulsar tim- 
ing observations. We will not attempt to provide a complete 
mathematical description because there are many books de- 
voted to the subject (e.g., Blackman and Tukey, 1959, Jenk- 
ins and Watts, 1969; Manolakis, Ingle and Kogon, 2005). 

Spectral analysis of a series of regularly spaced obser- 
vations of length T is normally performed via the discrete 
Fourier transform (DFT). The squared magnitude of the 
DFT, properly scaled, known as the periodogram P{f), is 
often used as the spectral estimator. Here the sample inter- 
val is 5 and we normalize P{f) to have the units of a power 
spectral density. 



P(i=k/T) = (T/n ) 



n-l 

E 

1=0 



r{t = I5)e-'^-'""^ 



(10) 



However, P(f) will be biased if the spectrum being es- 
timated is not white. The bias is caused by the finite length 
of data. The data are effectively multiplied by a time win- 
dow w{t), which is often rectangular (i.e., it is unity dur- 
ing the observations and zero otherwise). Thus the Fourier 
transform of r{t)w{t) is the convolution of R{f) with W{f). 
The measured power spectrum P{f) is the convolution of 
the spectral density S{f) with |H^(/)p. For a rectangular 
window of length T the spectral window is given by 



|M/(/)|2 = |sin(7r/r)/(7r/r)|2 



(11) 



This window is most widely used because it provides the 
highest frequency resolution. This is particularly important 
in pulsar timing analysis. The sidelobes of this window fall 
off as f~^. This makes power law spectra which fall faster 
than f^^ heavily biased because window sidelobes will dom- 
inate all frequencies higher than / — 1/T. Steeper spectra 
must be analysed with "prewhitening" and "postdarken- 
ing" to minimize spectral leakage (e.g., Jenkins and Watts, 
1969). One applies a linear pre- whitening filter which is im- 
plemented in the time domain. The purpose of this filter is 
to make the spectrum close enough to white that leakage is 
insignificant. For example, if a;(fc) is the input and y{k) is the 
output, a first difference filter is y{k) — x{k) — x{k — 1). The 
effect of this filter is to multiply the Fourier transform of the 



input by the transfer function H[f) = 2sin(7r/(5). This mul- 
tiplies the power spectrum by \H{f)\^ — (2sin(/5))^, which 
has an /^ behavior at low frequencies. Thus it will whiten a 
power law spectrum with exponent —2. The whitened data 
can be analyzed with little spectral leakage if its spectral 
exponent is between -0.5 and -3.5. We then correct the es- 
timated spectrum of the whitened data by dividing it by 
|/f(/)p. This is called post-darkening. 

For steeper spectra a second difference (two applica- 
tions of the first difference) will be required. The transfer 
function of the second difference is H{f)^. Unfortunately 
spectra often are power law at low frequencies, descending 
into white noise at high frequencies. The white noise will be 
transformed to /* behavior and spectral leakage can occur 
backwards from high frequencies to low frequencies. To pre- 
vent this one must also filter the original observations with a 
low-pass filter, implemented in the time domain, which has 
a transfer function of the form L{f) = 1/(1 -|- {f / fcf')- Here 
the corner frequency /c is chosen to be near the frequency 
at which the power law component reaches the white noise 
level. The residuals after second differencing and low pass 
filtering are then spectrum analyzed. The post-darkening is 
done by dividing the estimated spectrum of the filtered data 
by \LU)\'\H{f)\\ 

Since we do not know the spectral exponent a priori, the 
analysis must be done in stages. A first estimate can be done 
with a periodogram. If it appears to be limited by leakage 
then another analysis must be done using first difference 
pre-whitening and post-darkening. If it still appears to be 
limited by leakage one must do another analysis using second 
differencing and low pass filtering. In some cases a more 
complex pre-whitening filter may be required, but we have 
not seen any such cases in pulsar timing analyses. 

We can apply pre-whitening and post-darkening, as 
described above, only to regularly sampled measurements 
because the temporal filtering operations requires equally 
spaced data. Fortunately this is possible for pulsar timing 
applications because the low frequency spectra are often 
power-law and the high frequencies are white. We can use a 
low pass filter to separate the low and high frequencies. The 
low frequencies are heavily over-sampled and can be inter- 
polated onto a regular grid without much bias. The high fre- 
quencies cannot be interpolated, but as they are white they 
will not suffer from spectral leakage and we can use existing 
least-squares methods. Thus we can assemble a composite 
spectrum by splitting the spectrum with a low pass filter 
and analyzing the two halves with different methods 

The DFT is equivalent to a least squares fitting of com- 
plex exponentials, but since the complex exponentials are 
orthogonal in the regularly sampled DFT, they can be fit 
either individually or simultaneously and the result will be 
the same. If the sample spacing is not regular the complex 
exponentials are not orthogonal. It is not possible to fit them 
all simultaneously because they are highly covariant. How- 
ever we can obtain unbiased estimates of the variance at each 
frequency using the Lomb-Scargle L-S (Scargle, 1982) or Z-K 
(Zechmeister and Kurster, 2009) methods. This is sufficient 
for our purposes. These methods are essentially least-squares 
fits of a sine-cosine pair independently at each frequency, and 
scaled so the expected value of each component is the same 
if the input time series is white noise. The LS method is un- 
weighted and the Z-K method is weighted. These methods 
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also have window functions W{f) but their windows have 
much higher sidelobes than does sm{-K fT) / (ji fT) . 

If the covariance matrix is known, we can estimate the 
spectrum of irregularly sampled data using the Cholesky 
least squares method. Of course if the covariance matrix 
were known exactly we would know the spectrum and would 
not need to do a spectrum analysis. Fortunately we do not 
need to know the covariance matrix exactly, for the same 
reason we do not need to do the prewhitening exactly, it is 
only necessary to whiten the spectrum enough to eliminate 
leakage. If the window sidelobes are lower than the main 
lobe by a factor of 10, then one can tolerate local varia- 
tion in the spectrum of approximately a factor of 10. The 
Cholesky method essentially extends the L-S or Z-K meth- 
ods by fitting a sine-cosine pair at each frequency, but first 
whitening both the data and the model (the sine-cosine pair) 
with the Cholesky transformation. 

We show some examples of least-squares spectrum anal- 
ysis in Figure 2. Here we simulated a steep red spectrum with 
white noise under two sampling regimes. This was done in- 
dependently of TEMP02. We simulated the red component 
in the Fourier transform domain creating a daily sampled 
time series 100 times longer than the actual data span. We 
interpolated this to the desired sample times using a con- 
strained cubic spline interpolator. Then we added the sam- 
pling noise. This provided 100 realisations of the process. In 
panel (a) we used regular sampling with equal errors and in 
panel (b) we used the actual sampling of PSR J1713-I-0747 
and the corresponding errors from the Verbiest et al. (2009) 
data set. In both cases we fitted and removed a quadratic, 
as would always be done with pulsar timing observations. 
The spectra of 100 realisations of the random process have 
been averaged to reduce the estimation errors. In each case 
we have shown the actual spectrum as a short dashed line. 
The Z-K spectral estimates are shown as long dashed lines. 
In the upper panel the sampling is regular and the weights 
are constant so the DFT estimate, the Z-K estimate, and 
the L-S estimate are the same. One can see that the Z-K 
estimate suffers from very serious spectral leakage and the 
quadratic removal has reduced the first spectral estimate by 
a factor of about 10. In the lower panel the Z-K estimate 
shows even stronger spectral leakage, but quadratic removal 
has not reduced the first spectral estimate as severely as in 
the upper panel. In both panels the Cholesky spectral esti- 
mates are shown as solid lines. In the upper panel this is an 
excellent match to the actual spectrum. In the lower panel 
the shape is correct but the Cholesky estimate is about a fac- 
tor of two higher than the actual spectrum. Here the known 
spectrum was used to find the covariance matrix and the 
Cholesky transformation. Of course this is not possible with 
observations. We will discuss the iterative approach we use 
for observations in the following section. 

One can see that in both panels the noise level at f 
= 1 y~^ with the Cholesky estimate is much lower than 
with the Z-K estimate. The uncertainties on an estimate of 
position or proper motion would be correspondingly lower. 
In panel (a) one can see the window sidelobes of the WLS 
estimator fall like f~^ as expected. In panel (b) the side- 
lobes do not continue to fall with increasing /. This is be- 
cause they are dominated by the effects of irregular spac- 
ing and variable errors. This window is equivalent to the 
"dirty beam" in aperture synthesis with a sparsely sampled 




Frequency (y ) 



Figure 2. Simulated power spectra. Each spectrum is the average 
of 100 realisations. The siiort dashed lines show the theoretical 
spectra. The long dashed lines are the WLS spectral estimates 
and the solid lines are the Cholesky spectral estimates. Panel (a) 
shows the spectra for regular sampling and equal weights. Panel 
(b) shows the spectra for irregular sampling and variable weights. 



aperture. In panel (a) the excellent agreement between the 
mean Cholesky spectrum and the actual spectrum shows 
that the Cholesky method is effective in eliminating leak- 
age. We believe that this is the first demonstration of the 
Cholesky transformation for this purpose. In panel (b) the 
mean Cholesky spectrum appears to be biased high by a 
factor of approximately 2. We think that this is due to the 
integrated effect of the sidelobes of the dirty beam. We have 
not (yet) found an analytical way to remove this bias, but 
we believe that it could be calibrated out using simulations. 
Fortunately it is not necessary to remove it for least-squares 
fitting of a timing model because the y^ factor will absorb 
any constant factor, provided that the shape of the spectrum 
is correct. 

We have also computed the covariance matrices of the 
Cholesky spectral estimates and find that they are essen- 
tially independent, whereas those made using WLS are 
highly dependent on the first spectral estimate. This is a 
very important point because it makes optimum detection 
of a power-law process possible in the frequency domain. 
In this case the normal procedure would be prewhitening 
followed by low-pass filtering and then estimation of the 
variance and possibly cross-covariances. It is equivalent to 
Wiener filtering. If one Fourier transforms the residuals then 
spectra and possibly cross-spectra can be estimated. Each 
spectral estimate for which the signal is greater than the 
noise will provide two degrees of freedom. If the spectral es- 
timates can be made independent then a weighted sum of 
the spectral (and cross-spectral) estimates is exactly equiv- 
alent to the optimal Wiener filter. 
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4 COVARIANCE MATRIX ESTIMATION 

The covariance matrix of n observations contains n{n — l)/2 
ternrs so it is not feasible to estimate the entire matrix from 
the n observations without a simpUfying statistical model. 
Here we assume that the timing residuals can be modeled 
as the sum of two random processes: the correlated timing 
noise x{t), and the uncorrelated measurement error e(f). We 
assume that x{t) can be modeled as a wide-sense stationary 
process with a covariance function c(t) — {x(t)x(t + r)). 
Its power spectrum P{f) is the Fourier transform of c{t). 
The covariance matrix for x{t) is C where each element 
Cij = cijij) and Tij = \ti — tj\. The timing residuals are 
sampled irregularly and e(t) is a point process consisting of 
uncorrelated measurement errors which have different vari- 
ance at each sample time. Its covariance matrix is a diago- 
nal matrix V, where the components of the diagonal are the 
measurement variances for each sample. The overall covari- 
ance matrix is simply the sum of V and C. 

There must be cases where the assumption that the 
timing noise is wide-sense stationary breaks down since the 
causes of timing noise are incompletely understood. This 
would be a very interesting area of further research. There is 
also a subtle but important difference between the statistics 
of the pre-fit and post-fit residuals. Even if the pre-fit resid- 
uals are wide-sense stationary the post-fit residuals will not 
be. Since we actually have to estimate the covariance matrix 
of the post-fit residuals, there is always some break-down in 
this assumption. Fortunately the effects of such break-downs 
can be estimated through simulations as will be discussed 
in sections 5 and 6. 

The observed TOAs are always accompanied by uncer- 
tainty estimates o-e(i), but these are generally insufficient to 
fully characterise the white noise. The tJe (i) are derived from 
the process of fitting a scaled and shifted template to the 
mean pulse shape. They will correctly describe the timing 
uncertainty caused by additive white gaussian noise, such 
as radiometer noise, but there are other potential sources of 
noise which can cause an uncertainty in the apparent pulse 
time of arrival. Observers often find that the "scatter" in the 
residuals appears to be larger than the error bars by a factor 
which is typically about two. This factor can be quite differ- 
ent for different receiving systems. The "fixData" plugin to 
Tempo2 can be used to account for this unexplained scatter 
by increasing the ae{i) estimates. The plugin uses the mean 
structure function on short time lags as an estimate of the 
high frequency rms residual, and scales cre(i) so it becomes 
independent of the receiving system. When the errors have 
been properly scaled computation of V is straightforward. 

The estimation of the covariance matrix C of red timing 
noise is not trivial, but it is easy to check that the trans- 
formed residuals are actually white and normalized. This 
is best done by estimating the spectrum using the L-S al- 
gorithm which is unbiased for such a random process. So 
we use an iterative approach: estimating the spectrum; fit- 
ting it with a parametric model; using that model to find 
the covariance matrix; finding the Cholesky transformation; 
transforming the data; and estimating the spectrum of the 
transformed data with the L-S algorithm. This spectrum 
should be white. If it is much fiatter than the original, as 
it always has been during our testing, we use this trans- 
formation to estimate the spectrum of the data using the 



Cholesky least squares method. From this estimate we ob- 
tain improved models of the spectrum; the covariance ma- 
trix; the transformation matrix, and ultimately a further 
improved estimate of the spectrum. As noted earlier, it is 
not essential that the whitened data have a perfectly flat 
spectrum, only that it be sufficiently fiat that spectral leak- 
age is not dominant. The first spectral estimate is obtained 
in two different ways, depending on whether the red noise in 
the residuals is "weak" or "strong". The entire process can 
be tested by simulation, including the effect of transforming 
the data with the wrong transformation matrix. 



4.1 Weak Red Noise 

The uncertainty in any estimated covariance function, 
such as c{t) for the timing noise, has the form Sc ~ 
iro/Tots)°-'^c{0) where c(ro) = c(0)/2 (Jenkins and Watts 
1969). By comparison the uncertainty in estimating a white 
noise variance a^ has the form Sa^ = a^/n^'^. Since the 
time scale to for red timing noise can approach Tabs the er- 
ror in the covariance matrix can be large. However in cases 
where tq <g Tots and the variance of the timing noise is com- 
parable with the variance of the white measurement error, 
one can estimate c{t) directly with adequate accuracy. One 
simply subtracts the mean and sums the pairwise products, 
averaging them into suitable "r bins" weighted by their esti- 
mated uncertainties. For 19 of the 20 millisecond pulsars in 
the Parkes Pulsar Timing Array this approach gives a good 
estimate of c{t). The exception, PSR J1939-(-2134, which 
has TO « Toba, is shown in Figure 1. Irregular sampling is 
not a problem for any of the other pulsars in the Verbiest et 
al (2009) data set for which tq ^ Tabs, because the distri- 
bution of time differences r between sample pairs, is much 
more uniform than the distribution of sample times. 

In such cases we have found that an exponential model 
c{t) — c(0) exp(— |r/To|), providing an amplitude and a time 
scale, is sufficient. The exponential model is perhaps the 
simplest physical model, a single time-constant, and it has 
smooth behavior in the frequency domain, i.e., Px{f) ~ 
2c(0)ro/(l -I- (27r/ro)^). We fit this model to c{t) to ob- 
tain c(0) an d rp. The residuals f or the millisecond pulsar 
J1045-4509 l|Verbiest et aLlbOOgl ). which provide an exam- 
ple of this, are shown in Figure 3(a). One can see that there 
is low frequency noise present, but it is not dominant. The 
estimated covariance c{t) is shown in Figure 3(b). Here the 
variance of the residuals is marked with an 'o' symbol on 
the y-ajds for comparison with the low frequency variance 
c(0). The best fit exponential model is shown as a solid line. 
The time scale is clearly much less than the data duration, 
and the variance in the white noise is greater than that in 
the red noise. The OLS spectrum of the residuals whitened 
using the Cholesky method is shown in Figure 3(c). It is 
white within the estimation errors. 



4.2 Strong Red Noise 

The direct estimation of c{t) breaks down when the low- 
frequency variance in the timing noise is dominant. In this 
case the residuals show a slow variation that substantially 
exceeds the error bars, as shown in Figures 1(a) and 4(a). In 
most such cases the power spectrum has the form P{f) = 
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Figure 3. Analysis of PSR J1045-4509. Panel (a) shows the 
residuals from a standard pulsar timing model fit for u and u. 
Panel (b) shows the estimated covariance function of the residu- 
als. The variance, which includes the white noise, is marked with a 
circle on the y axis. The best- fit exponential model is drawn as the 
solid line through the points which are the covariance measure- 
ments averaged into logarithmic bins. Panel (c) shows the OLS 
power spectrum of the residuals whitened and normalised with 
the Cholesky transformation. The horizontal dotted lines are the 
expected mean and 95% confidence intervals for normalised white 
noise with this sampling. 



Af^" where a > 2. It is hard to estimate c{t) because there 
are few degrees of freedom (i.e., tq ~ Tots)- However it is 
usually possible to make a power-law model of the spectrum 
and to specify the amplitude of that power law with rea- 
sonable accuracy. This is because each spectral estimate for 
which P{f) is greater than the white noise, can provide two 
degree of freedom. As noted earlier, one must use a spectral 
estimator which provides independent estimates of P{f)- 

Steep red processes require whitening, but we do not 
yet have the covariance matrix so we have to obtain a spec- 
tral estimate iteratively. We start by low-pass filtering the 
residuals to separate the red and white components. The 
resulting red component can be interpolated on to a regu- 
lar grid without nmch distortion because it is quite smooth 
after the low-pass filtering. We can then pre-whiten it with 
a first difference process, compute the |DFTp, and post- 
darken the result. This generally gives an adequate "first 
guess" at the power spectrum of the red component. We 
estimate the white component by subtracting the red com- 
ponent from the original residuals. We find the spectrum of 
the white component with the Z-K weighted least squares 
estimate. The next step is to fit a power-law model of the 
form Pm{f) = A/ {I + {JlJcfT''^ to the red spectral esti- 
mate for the frequency range below the frequency at which 
the red and white spectra cross over. We find that the "cor- 



ner frequency" /c should not be /c < l/Tots because even if 
the red noise is a pure power law, fitting v and v will flatten 
the spectrum of the residuals below this frequency. We then 
compute c(t) by Fourier transformation of Pm{f) and fi- 
nally obtain the covariance matrix as discussed earlier. This 
covariance matrix is used to re-estimate the power spectrum 
using the Cholesky least squares procedure. This spectrum 
is used to revise the model Pm{f), a new c{t) is found, a 
new covariance matrix, and a new Cholesky estimate of the 
power spectrum. At this point, the power-law model, the 
covariance matrix and the power spectral estimate are self- 
consistent. We then check to see that the whitened resid- 
uals look white, by computing their power spectrum using 
an OLS procedure (because they should be both white and 
normalized). 

These steps are illustrated in Figure 4 for the pul- 
sar J1539— 5626. The residuals obtained from the Parkes 
analogue filterbank (Manchester et al. 2001) are shown in 
the top panel (a). In the second panel (b) we show the 
\DFTf power spectrum of the red component as a jagged 
solid line obtained by first difference pre-whitening and 
post-darkening. The power-law model Pm{f) is shown as 
a smooth solid line, and the WLS spectrum of the white 
component is shown dotted. In the third panel (c) we show 
the the Cholesky spectrum as a jagged solid line and the fi- 
nal model Pm{f) as a smooth solid line. The original model 
is shown as a dashed line, and the WLS spectrum of the 
white component is shown exactly as in panel (b) for com- 
parison. One expects to see the Cholesky spectrum merge 
into the spectrum of the white component and this does in 
fact occur. Finally in the lowest panel (d) we show the OLS 
spectrum of Rw, the Cholesky-transformed residuals. The 
mean and 95% confidence limits for a unit variance white 
spectrum are shown as horizontal dashed lines. One can see 
that the transformed residuals are in fact quite consistent 
with white noise. 

We have provided options in Tempo2 to perform this 
iteration with no differencing, first-order differencing or 
second-order differencing. Although the differencing is only 
used to get a first-guess of the spectral model, we prefer to 
use the least order of differencing if the spectra are simi- 
lar with two different orders. For example PSR J1539— 5626 
can be analyzed either with no differencing or first-order dif- 
ferencing, the results are similar. By comparison the anal- 
ysis of PSR J1939+2134, shown in Figure 1, requires first 
or second-order differencing. If the exponent of the spectral 
model was steeper by more than a few tenths with a higher 
order differencing, we would choose the higher order. How- 
ever it is always necessary to low-pass filter the residuals 
and interpolate them onto a regular grid. We perform the 
low-pass filter by convolving the residuals with a weighted 
exponential smoothing function of the form exp(— |f/rs|) 
with a time constant Ts ~ 20 days. The low pass filter re- 
sponse is 0.25 at / = (27rrs)~^. The smoothing time can 
be changed and should be adjusted so that the bandpass 
roughly matches the intersection of the red and white spec- 
tra. In figure 4 the intersection frequency is 1/180 days, so 
Ts should be ~ 180/27r. The results of this convolution are 
sampled at the original sampling times. They are then inter- 
polated onto a regular grid using a cubic spline constrained 
so its step response does not overshoot (Fritsch & Carlson 
1980). The white component is found by subtracting the red 
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Figure 4. Analysis of PSR J1539— 5626. Panel (a) shows the re- 
sulting residuals from a standard pulsar timing model fit for the 
spin-frequency and its derivative. Panel (b) shows the spectrum 
of the smoothed and interpolated red component formed using 
the first difference pre-whitening method as a solid jagged line. 
The WLS spectrum for the white component is shown dotted. 
The power law model Pm{f) fit to the red component is shown 
as a smooth solid line. Panel (c) repeats the WLS shown dotted 
in panel (b) and the first model Pm(f) also shown as a solid line 
in panel (b). The Cholesky spectrum is shown as a solid (jagged) 
line. The revised model Pm{f) fit to the Cholesky spectrum is 
shown dashed. Panel (d) shows the OLS spectrum of the whitened 
and normalised residuals with the mean and 95% confidence in- 
tervals shown as dotted lines. 



component, evaluated at the original sample times, from the 
original residuals. Further details of this process are given 
on the Tempo2 web page. 



5 PERFORMANCE COMPARISON 

The performance of the various fitting algorithms can be 
compared by simulation of observations of a pulsar with 
known parameters and added noise with known statistics. 
We expected the various algorithms to be unbiased because 
least-squares algorithms perform well in this respect, but we 
found a serious bias in the fitwaves algorithm which we will 
discuss in section 5.1. The primary performance measure is 
the rms variation in the parameter estimates found by re- 
peating the same simulation many times. It is also important 



that the algorithm return estimate of the uncertainties in the 
parameters which agree well with the actual rms variations. 
The parameters of the timing model have different ef- 
fects on the residuals. Some of them are essentially time 
harmonic: the position and proper motion parameters ad- 
just the amplitude and phase of an annual sine wave; the 
parallax does the same for a biannual sine wave; and the 
binary parameters adjust sine waves at harmonics of the bi- 
nary period. The spin frequency parameters u and v adjust 
the linear and quadratic polynomial coefficients and their 
effects are confined to frequencies / < 1/Tobs- Jumps are 
Heaviside step functions having a power spectrum of the 
form 1/f^. Thus it is difficult to remove the timing noise us- 
ing the polynomial or Fourier schemes without distorting the 
jump. We simulated a four-dimensional test matrix: different 
algorithms; different sampling; different timing noise; and 
different parameter types. We tested four algorithms: WLS, 
Cholesky, polynomial, and Fourier; two sampling schemes, 
regular and irregular; two noise types, weak red and strong 
red; and three parameter types, time-harmonic, broadband, 
and polynomial. The weak red noise was simulated with 
an amplitude of ^ = 1 x 10~'^''y^, a corner frequency of 
fc = 0.3 y~^ and a spectral exponent a — 2.5. The strong 
red noise had A = 1 x 10"" y^, fc = 0.01 y"^ and a = 5.5. 
The irregular sampling was taken from actual observations 
of PSR J0711-6830 that contains 225 points over 14.2 y. 
The regular sampling had the same number of points sam- 
pled over the same data span. Each case was simulated 100 
times with the same parameters, but different realizations 
of the red and white noise. For each realization we fitted 
for the standard pulsar timing model parameters and for 
comparison recorded the pulsar's right ascension, a, proper 
motion in right ascension, ^a, parallax, n, and the size of 
the jump. 



5.1 Bias in the Fourier method 

We found that the fitwaves algorithm was significantly bi- 
ased, in the sense that the parameter estimates depended 
on the initial conditions. Parameters were biased towards 
zero, i.e., towards their initial conditions. So we ran special 
simulations to test for initial-condition bias for a harmonic 
parameter (proper motion in right ascension) and a broad- 
band parameter (a phase jump) . We ran 100 simulations of 
the same fit with slightly different initial conditions. This 
showed that the WLS, Cholesky and polynomial algorithms 
were unbiased. We compare the results for the Cholesky and 
FITWAVES algorithms in Figure 5. In the top panel (a) the 
results for a phase jump are overplotted. In this case the 
FITWAVES result has very small error bars but it is 100% cor- 
related with the initial condition. In the second and third 
panels (b) and (c) the results for proper motion in a axe 
shown. Here one can see that the error bars for the fit- 
waves results are half those for the Cholesky results, but 
the FITWAVES results are heavily biased. This bias applies to 
all FITWAVES fits and therefore this technique should not be 
used for parameter fitting without making a careful study 
to confirm that it is unbiased in the application of interest. 
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Figure 5. Analysis of the bias found when using the FITWAVES 
algorithm. The parameter estimates for a jump are shown in panel 
(a). Here the Cholesky points are marked with circles and error 
bars. The FITWAVES points are marked with error bars but the bars 
are so small that the estimates almost appear to be a continuous 
diagonal line. The parameter estimates for proper motion in RA 
/la are shown in panels (b) and (c) with the same symbols. In all 
cases a best fit straight line is drawn through the estimates. 



Table 1. The ratio of the rms parameter variation to the rms 
parameter variation for the regularly sampled Cholesky algorithm 



Regular Sampling 

WLS Poly Choi 



Irregular Sampling 

WLS Poly Choi 







Weak red 


noise 






a 


1.01 


1.02 


1.00 


1.81 


1.50 


1.28 


fJ,ci 


1.20 


1.56 


1.00 


1.48 


1.26 


0.76 


■K 


1.27 


1.69 


1.00 


2.33 


2.15 


1.14 


Jump 


4.46 


3.26 


1.00 


8.45 


16.42 


5.00 






Strong red 


noise 






a 


17.38 


1.53 


1.00 


39.74 


3.31 


2.96 


Ma 


13.32 


1.67 


1.00 


28.92 


3.24 


2.22 


7r 


3.52 


1.17 


1.00 


33.06 


5.02 


2.46 


Jump 


55.99 


2.11 


1.00 


206.02 


23.62 


14.52 



Table 2. The ratio of the rms parameter variation to the esti- 
mated uncertainty from the Tempo2 fit. 



Regular 


Sampling 


Irregular Sampling 




WLS 


Poly 


Choi 


WLS 


Poly 


Choi 






Weak red 


noise 






a 


1.41 


2.70 


1.00 


2.37 


3.30 


1.24 


Mq 


1.52 


3.32 


1.05 


2.06 


2.76 


1.01 


■K 


1.22 


2.76 


1.06 


2.16 


3.34 


1.11 


Jump 


5.44 


3.10 


1.16 


4.48 


3.23 


1.26 






Strong red 


noise 






a 


0.91 


1.96 


0.95 


1.45 


1.20 


0.90 


fJ-a 


0.83 


2.26 


1.14 


1.47 


1.53 


0.93 


7r 


0.22 


1.61 


1.23 


1.49 


2.23 


0.94 


Jump 


9.68 


2.95 


1.16 


7.97 


4.14 


0.95 



5.2 Jumps and time-harmonic parameters 

As the WLS, polynomial and Cholesky methods were un- 
biased we ran the remaining simulations without changing 
the initial conditions. The results are shown in Table [T] 
We found that the primary measure, the rms variation, im- 
proved consistently from WLS, to polynomial, to Cholesky, 
as expected. The improvement was greater when there was 
strong red noise and irregular sampling. To facilitate com- 
parison we have normalised the rms variation to the rms for 
the Cholesky method with regular sampling. One can see 
that the effect of irregular sampling is large for the jump fit. 
This is because the jump is in a data gap, as it often is in 
real observations. 

In Table[2]we show the ratio of the rms of the parameter 
variation in the 100 realisations with the mean parameter 
uncertainty reported by Tempo2. In all cases Cholesky gave 
an accurate estimate of the parameter uncertainties whereas 
the uncertainties reported by WLS and polynomial methods 
were quite unreliable. 



5.3 Estimation of u and i> 

The apparent pulse frequency, ly, can often be determined 
with ten or more decimal places. This precision is required 
when predicting the pulse period and phase for observations 
of the pulsar, but, as the measured pulse frequency depends 
upon many factors including the unknown radial velocity 



of the pulsar, long-term drifts in terrestrial time standards 
and the timing noise, it does not represent the intrinsic 
spin frequency of the pulsar with this accuracy. As v and 
!> are obtained from fitting a quadratic polynomial function 
to the timing residuals they represent the lowest frequen- 
cies (/ < l/Tobs) in the spectrum of the residuals. These 
frequencies are the most difficult to estimate because fit- 
ting the quadratic significantly modifies the residual power 
P{.f ^ l/Tobs)- Furthermore, at the lowest frequencies, it 
can be difficult to distinguish between random variations, 
which should be included in the whitening matrix, and de- 
terministic variations which should be absorbed in the tim- 
ing model. For instance, "glitch" events during which the 
pulsar's rotation rate suddenly increases should be included 
as part of the timing model (e.g., Wang et al. 2000). In Ta- 
bles |3] and |4] we show the results of our simulations for v and 
i>. The Cholesky method provides 3 or 4 times better param- 
eter accuracy than a standard WLS, and the error estimates 
for the standard WLS are more than an order of magnitude 
too low. Unfortunately the Cholesky error estimates are rea- 
sonably accurate only for regularly sampled observations. 
For irregularly sampled observations the Cholesky error es- 
timates are less reliable and observers will have to simulate 
such observations if the error estimates are important. Tools 
for such simulations are available in Tempo2. 

It is clear from Figure 1 that the WLS and Cholesky 
methods will provide different values u and i> and the resid- 
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Table 3. The ratio of the rnis parameter variation to the rms 
parameter variation for the regularly sampled Cholesky algorithm 



Regular Sampling 

WLS Choi 



Irregular Sampling 

WLS Choi 





Weak red noise 




1.31 


1.00 16.29 


16.20 


1.04 


1.00 3.84 


3.81 




Strong red noise 




1.81 


1.00 5.98 


4.92 


2.82 


1.00 3.36 


2.28 



Table 4. The ratio of the rms parameter variation to the esti- 
mated uncertainty from the Tempo2 fit. 



Regular 

WLS 


Sampling Irregular Sampling 

Choi WLS Choi 


u 7.08 
u 7.44 


Weak red noise 
0.96 64.06 15.30 
1.45 44.56 13.27 


u 17.35 
V 20.23 


Strong red noise 
1.19 31.01 3.06 
1.51 29.07 3.37 



uals will be different. The statistical uncertainty will be sig- 
nificantly smaller with the Cholesky method, but this is not 
the most important aspect of the analysis. If the results 
are to be used to interpolate the phase for comparison with 
other observations (for example X-ray or gamma-ray obser- 
vations) then the timing noise must be modeled, interpo- 
lated, and added to the timing model. This modeling can be 
done with the fitwaves procedure. This should always be 
done with pulsars that have steep red timing noise, regard- 
less of whether the fit for v and i> was done with the WLS 
or Cholesky algorithm. Currently Tempo2 does not include 
a straightforward procedure to extrapolate the phase (for 
instance to prepare for future observations) if the timing 
residuals are significantly affected by a steep red noise pro- 
cess. Since one has a statistical model of both the timing 
noise and the sampling noise, it would probably be wise to 
use a Wiener filter for both interpolation and extrapolation. 



6 ROBUSTNESS TO COVARIANCE ERRORS 

The Cholesky method is optimal only if the covariance ma- 
trix is known, so it is important to establish its sensitivity 
to deviations of the covariance matrix used to obtain the 
whitening transformation, from the true covariance matrix 
of the observations. As the greatest advantage in using the 
Cholesky method is for steep power-law timing noise, we 
have simulated this case with variations in the parameters 
of the spectral model. The simulated spectrum had a power 
law exponent of —5.5 and a corner frequency of 0.07 y'^. 
We adjusted the three independent parameters of the fit; 
the spectral exponent; the corner frequency; and the ratio 
of the white noise to the timing noise. We chose to use one 
of the harmonic parameters, the proper motion in right as- 
cension, as the test case, with ^a = lOmasy"^. The results 
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Figure 6. The robustness of the Cholesky parameter estimates 
to errors in estimating the covariance matrix of the residuals is 
shown using ^a as an example. The results come from 100 sim- 
ulated realisations of a steep red spectrum with white noise. In 
panel (a) the effect of an error in the spectral exponent is shown. 
In panel (b) the effect of an error in the corner frequency is shown. 
In panel (c) the effect of an error in the white noise level is shown. 
In each case the mean error on the parameter is shown with the 
right-facing error bar. The rms of the parameter values is shown 
with the left-facing error bar. 



are shown in Figure 6 as the observed and predicted Icr- 
uncertainties versus the parameter. 

One can see that in most cases the predicted uncertain- 
ties slightly exceed the actual rms parameter variation. The 
cases where this difference is inverted are when the corner 
frequency is much less than the recommended minimum. 
The parameter estimate remains unbiased for all cases sim- 
ulated. Generally the uncertainties are not increased signif- 
icantly when the spectral exponent changes from 4 to 8; 
when the corner frequency is increased or decreased by a 
factor of two; or when the white noise variance is increased 
or decreased by a factor of 4. These are drastic variations 
which are larger than those expected in practice. 

These simulations also indirectly test the effects of 
breakdown in the wide-sense stationarity assumption due to 
estimating the covariance from the post-fit timing residuals 
rather than the pre-fit timing residuals. The most impor- 
tant effect here is removing a quadratic by fitting i/ and 
which removes low frequency power and weakens the wide- 
sense stationarity assumption. We have tested a very wide 
range of variation in the low frequency power by changing 
the spectral exponent. The shape of the spectral model was 
normalized so the power was independent of the exponent 
at a frequency of one cycle per year. Thus power at the low- 
est frequencies was changed by factors of 0.01 to 100 with 
negligible change in the estimated proper motion. 



7 SUMMARY 

In the presence of red timing noise, we recommend that 
pulsar timing analysis always be done using the Cholesky 
method. Even under extreme conditions of very steep spec- 
tra, very irregular sampling, and highly variable errors it 
is possible to obtain an adequate estimate of the covari- 
ance function of the residuals. The Cholesky method will 
provide reliable error estimates except for the parameters v 
and v and it will significantly improve the accuracy of the 
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parameters when the residuals are very red. The Cholesky 
method will also be useful in combined analyses, such as us- 
ing multi-frequency observations to estimate the dispersion 
measure, using multiple pulsars to estimate clock errors or 
in searching for the existence of gravitational waves, because 
it will properly normalise the different frequencies and dif- 
ferent pulsars. The Cholesky method also provides an excel- 
lent power spectral estimate under conditions where spec- 
tral leakage would normally be a problem, i.e., steep spectra, 
high dynamic range, irregular sampling and variable errors. 
This aspect of the method has much broader application 
than to pulsar timing alone. 
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